The fuzzy integral has been shown to be an effective tool for the aggregation of evidence in 
decision making. Of primary importance in the development of a fuzzy integral pattern 
recognition algorithm is the choice (construction) of the measure which embodies the 
importance of subsets of sources of evidence. Sugeno fuzzy measures have received the most 
attention due to the recursive nature of the fabrication of the measure on nested sequences of 
subsets. Possibility measures exhibit an even simpler generation capability, but usually 
require that one of the sources of information possess complete credibility. In real 
applications, such normalization may not be possible, or even desirable. In this report both 
the theory and a decision making algorithm for a variation of the fuzzy integral are presented. 
This integral is based on a possibility measure where it is not required that the measure of the 
universe be unity. A training algorithm for the possibility densities in a pattern recognition 
application is also presented with the results demonstrated on the shuttle-earth-space training 
and testing images. 


1. Introduction 

Decision making is a basic problem in science, engineering, and even in daily life. There 
are often conflicting requirements of low error rates and minimum computation time to 
reduce the cost. The purpose of this paper is to propose the concept of possibility expectation 
via the possibility integral as a decision making scheme, which can be used to construct 
optimal decision making algorithms. A possibility expectation is a value of nonlinear 
integration of two pieces of information, namely, an evidence function $h(x)$ and a possibility 
measure $Pos(\cdot)$. A possibility measure is a monotonic set function with the property that the 
measure of the universe $X$ can be less than or equal to unity. 

An example of possibility expectation is the following: In the courtroom, although the 
witnesses for both the defendant and plaintiff promise that they will tell the truth, the judge 
still needs to assign a grade of credibility (possibility densities) to each person to evaluate 
what the person says (evidence). The judge will integrate what each group of witnesses said with 
his belief in that group's credibility (possibility measure). Then the judge makes his decision 
(possibility expectation). 

In multicriteria decision making, as can be found in most pattern recognition problems, 
the value of each source of information (and thus all subsets of sources) toward each 
alternative can be different. For example, "greenness" may be a very important feature for 
recognizing certain types of trees in an image; whereas it may be quite unimportant as a feature 
for a roof of a building. This difference in the importance or credibility of subsets of 
information sources will be encoded in a possibility measure. The degree to which a given 
image region is green, to continue the example, is objective evidence supplied by the 
information source. After collecting all such objective information, it is the job of the decision 
making algorithm to fuse the objective evidence together with the worth of the sources. In our 
methodology, this will be accomplished by utilizing the possibility integral, a variation of the 
fuzzy integral [1]. 

The particular possibility measures which we describe generalize fuzzy measures in that it 
is not required that the measure of the entire domain of discourse be one. In a pattern 
recognition problem, it may not be possible, or may not be desirable to force one of the sources 
of information to have "perfect credibility". By relaxing this requirement, not only do we 
match real situations better, we also provide the opportunity to create better decision making 
algorithms, as we shall see later. 

For a pattern recognition environment, a method to learn the possibility densities (values 
upon which the measure is generated) from training data is given. The results of the 
subsequent algorithm are used to segment a shuttle from the earth and space background. 


2. Possibility Measures and Possibility Integral 


Definition 2.1 A set function $Pos(\cdot): 2^X \to [0, 1]$ is called a possibility measure if it satisfies the 
following properties: 

(1) $Pos(\emptyset) = 0$, $Pos(X) \le 1$. 

(2) If $A, B \in 2^X$ and $A \subseteq B$, then $Pos(A) \le Pos(B)$. 

(3) $Pos\left( \bigcup_{j \in J} A_j \right) = \sup_{j \in J} \left[ Pos(A_j) \right]$. 

Note: If $X$ is finite, a possibility measure is not a fuzzy measure when $Pos(X) < 1$; it is the 
same as a fuzzy measure only when $Pos(X) = 1$. If $X$ is infinite, a possibility measure is not a fuzzy 
measure in general [2]. Puri and Ralescu [3] give two counterexamples which show that, even in 
"nice" cases, a possibility measure is not a fuzzy measure in the infinite case. 

Definition 2.2 Let $X = \{ x_j \mid j = 1, \ldots, n \}$ be a finite set and let $Pos$ be a possibility measure on $2^X$. 
The set $\{ p^j = Pos(\{x_j\}) \mid j = 1, \ldots, n \}$ is called the set of possibility densities for $Pos$. 

By definition of the possibility measure, it is clear that the measure of any subset A of X 
can be generated by 

$Pos(A) = \max_{x_j \in A} ( p^j )$, 

and hence, a possibility measure is easily generated by its densities. 
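
The generation of a measure from its densities can be sketched in a few lines of Python. This is only an illustration: the function name `possibility_measure`, the use of zero-based indices to identify sources, and the density values are assumptions made for the example and do not come from the report.

```python
from typing import Iterable, Sequence


def possibility_measure(densities: Sequence[float], subset: Iterable[int]) -> float:
    """Pos(A): the largest density p^j among the sources in A; Pos(empty set) = 0."""
    return max((densities[j] for j in subset), default=0.0)


# Illustrative densities for three sources (made-up values).
densities = [0.3, 0.7, 0.5]

print(possibility_measure(densities, {0, 2}))    # measure of {x_1, x_3}: 0.5
print(possibility_measure(densities, range(3)))  # Pos(X) = 0.7, which need not be 1
```

Note that `Pos(X)` here is 0.7, so no source is forced to have complete credibility.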

We note that possibility theory can be induced not only from the nested bodies of evidence 
within the Dempster-Shafer theory [4], but also from the fuzzy sets introduced by Zadeh [6]. A 
fuzzy set $F$ is a set whose elements are characterized by the membership grade function 
$\mu_F(x): X \to [0, 1]$. A value of $\mu_F(x)$ expresses the grade of membership with which an element $x \in X$ 
belongs to the fuzzy subset $F$ of $X$. Let $\pi_F(x) = \mu_F(x)$ be a possibility distribution induced by a 
fuzzy set $F$. In general, a possibility distribution is thought of as an elastic restriction on the 
values within a domain of discourse which a fuzzy variable may assume [5]. The fuzzy set $F$ 
provides the meaning of the restriction. A possibility measure is defined as 
$Pos(A) = \sup_{x \in A} [ \pi_F(x) ]$ for all $A \in 2^X$. This relationship holds also for non-normal fuzzy sets [6]. 

Although a fuzzy set and a possibility distribution have a common mathematical expression, 
the underlying concepts are different [5]. 

Our possibility measures are non-normalized generalizations of what are referred to as 
S-decomposable measures [7, 8], these being a class of fuzzy measures which are easily 
computable. 

Definition 2.3 Let $h(x)$ be a function such that $h: X \to [0, 1]$, and let $Pos(\cdot)$ be a possibility 
measure on $2^X$. The possibility integral or the possibility expectation of $h(x)$ with respect to 
$Pos(\cdot)$ is defined as 

$\int_X h(x) \circ Pos(\cdot) = \sup_{\alpha \in [0, 1]} [ \alpha \wedge Pos(A_\alpha) ]$, where $A_\alpha = \{ x \mid h(x) \ge \alpha \}$. 

When $X = \{ x_j \mid j = 1, \ldots, n \}$ is finite, if we reorder $X$ such that $h(x_1) \ge h(x_2) \ge \ldots \ge h(x_n)$, 
then the possibility integral can be written as 

$\int_X h(x) \circ Pos(\cdot) = \bigvee_{j=1}^{n} [ h(x_j) \wedge Pos(A_j) ]$, where $A_j = \{ x_1, \ldots, x_j \}$. 


The rationale of the possibility expectation is to find the source within the universe where 
both the information value $h(x_j)$ and the possibility measure $Pos(A_j)$ are compatibly large, that 
is, where the feasibility of the data and the reliability of a subset of sources are jointly optimal. 
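
For a finite set of sources, the max-min form of the integral can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `possibility_integral` and the numerical values are assumptions, and the measure of each nested set is taken to be the running maximum of the densities.

```python
from typing import Sequence


def possibility_integral(h: Sequence[float], densities: Sequence[float]) -> float:
    """Finite possibility integral: max over j of min(h(x_j), Pos(A_j))."""
    # Visit the sources in order of decreasing evidence h(x_j).
    order = sorted(range(len(h)), key=lambda j: h[j], reverse=True)
    pos_a = 0.0  # measure of the empty set
    best = 0.0
    for j in order:
        pos_a = max(pos_a, densities[j])    # measure of the nested set A_j
        best = max(best, min(h[j], pos_a))  # running max of h(x_j) AND Pos(A_j)
    return best


# Illustrative evidence and densities for three sources (made-up values).
h = [0.9, 0.4, 0.6]
p = [0.3, 0.7, 0.5]
print(possibility_integral(h, p))  # 0.5
```

In this example the strongest evidence (0.9) comes from a source with low density (0.3), so the integral settles on the second source, where evidence 0.6 meets a nested-set measure of 0.5.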

The fuzzy integral developed by Sugeno [1] has the same formulation with the exception 
that a fuzzy measure is used in lieu of the possibility measure. One of the advantages of the 
possibility integral is that the measures $Pos(A_j)$ are easily calculated from the densities by the 
recursive relationship 

$Pos(A_1) = Pos(\{x_1\}) = p^1$; 

$Pos(A_j) = Pos(A_{j-1} \cup \{x_j\}) = Pos(A_{j-1}) \vee p^j$. 

In contrast, for a Sugeno fuzzy measure $g_\lambda$ with the fuzzy densities $\{ g^1, \ldots, g^n \}$, this 
recursive definition becomes 

$g_\lambda(A_1) = g_\lambda(\{x_1\}) = g^1$; 

$g_\lambda(A_j) = g_\lambda(A_{j-1} \cup \{x_j\}) = g_\lambda(A_{j-1}) + g^j + \lambda g^j g_\lambda(A_{j-1})$, 

where $\lambda > -1$ [1, 10, 11]. The value of $\lambda$ must be calculated from the equation 

$\prod_{i=1}^{n} ( 1 + \lambda g^i ) = 1 + \lambda$ [1]. 

If one is going to try to learn a measure (iteratively) from training data, the amount of 
computation necessary to learn a possibility measure, and then evaluate its possibility 
integral, is considerably less than that required for a Sugeno fuzzy measure and its fuzzy 
integral. 
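
The difference in effort can be seen in a small Python sketch. The possibility chain needs only a running maximum, whereas the Sugeno chain first requires a root of the polynomial equation in lambda. The sketch below is a rough illustration under stated assumptions: the function names are hypothetical, bisection is one possible root finder chosen here for simplicity (the report does not say which method was used), and at least two densities strictly between 0 and 1 are assumed.

```python
import math
from typing import List, Sequence


def possibility_chain(p: Sequence[float]) -> List[float]:
    """Measures of the nested sets A_1..A_n: a running maximum of the densities."""
    chain, m = [], 0.0
    for pj in p:
        m = max(m, pj)
        chain.append(m)
    return chain


def sugeno_lambda(g: Sequence[float], iters: int = 200) -> float:
    """Nonzero root lam > -1 of prod(1 + lam * g_i) = 1 + lam, by bisection.

    Assumes at least two densities, each strictly between 0 and 1.
    """
    total = sum(g)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # densities sum to one: the measure is additive

    def f(lam: float) -> float:
        return math.prod(1.0 + lam * gi for gi in g) - (1.0 + lam)

    if total < 1.0:         # the root is positive
        lo, hi = 1e-9, 1.0
        while f(hi) < 0.0:  # widen the bracket until the sign changes
            hi *= 2.0
    else:                   # the root lies in (-1, 0)
        lo, hi = -1.0 + 1e-12, -1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def sugeno_chain(g: Sequence[float], lam: float) -> List[float]:
    """Measures of the nested sets under the Sugeno lambda-rule recursion."""
    chain, m = [], 0.0
    for gj in g:
        m = m + gj + lam * gj * m
        chain.append(m)
    return chain


# Illustrative densities (made-up values).
densities = [0.2, 0.3, 0.1]
lam = sugeno_lambda(densities)
print(possibility_chain(densities))  # last entry is Pos(X), here below 1
print(sugeno_chain(densities, lam))  # last entry is approximately 1
```

The Sugeno chain always ends at a measure of (approximately) one for the whole set, while the possibility chain ends at the largest density.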

For a multiclass pattern recognition problem (or any multicriteria decision making 
problem), the set $X$ represents sources of information (criteria). Each class (alternative) will 
have its own evidence function $h_i: X \to [0, 1]$ to assess the feasibility that the decision is class $i$ 
(alternative $i$) from the standpoint of each individual source, $x_j$. Also, each class will have its 
own possibility measure $Pos_i$ which determines the worth of all subsets of sources in deciding 
that a particular object belongs to class $i$. Finally, the collection of possibility integrals 

$e_i = \int_X h_i(x) \circ Pos_i(\cdot)$, 

gives a class-individualized "fusion" of the direct evidence with the worth of that evidence. A 
final crisp decision can be made from the possibility expectations (integral values), for 
example, pick the class corresponding to the maximum possibility expectation. Alternatively, 
these expectation values can be used as confidences for later processing. 

3. Properties of The Possibility Integral 

Several interesting properties of the possibility integral are proved in [11]. Of particular 
interest to the algorithm presented in the next section are the following two results. 

Theorem 3.1 $0 \le \int_X h(x) \circ Pos(\cdot) \le Pos(X)$. 

Theorem 3.2 If $h_1(x) \le h_2(x)$ $\forall x$: 

$\int_X h_1(x) \circ Pos(\cdot) \le \int_X h_2(x) \circ Pos(\cdot)$, if $Pos(X) > h_1(x)$ for all $x$, 

$\int_X h_1(x) \circ Pos(\cdot) = \int_X h_2(x) \circ Pos(\cdot)$, if $Pos(X) \le h_1(x)$ for all $x$. 
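
A small worked example may help to see the two cases of Theorem 3.2. The numbers below are invented for illustration and are not taken from the report: two sources with densities 0.3 and 0.5, so that the measure of the whole set is 0.5. In each integral the sources are taken in order of decreasing evidence, and both nested sets happen to have measure 0.5.

```latex
% Illustrative values only: X = {x_1, x_2}, p^1 = 0.3, p^2 = 0.5, Pos(X) = 0.5.
\begin{align*}
\text{Case } Pos(X) > h_1(x):\quad
  & h_1 = (0.2,\ 0.4), \quad h_2 = (0.3,\ 0.45) \\
  & \int_X h_1(x) \circ Pos(\cdot) = (0.4 \wedge 0.5) \vee (0.2 \wedge 0.5) = 0.4 \\
  & \int_X h_2(x) \circ Pos(\cdot) = (0.45 \wedge 0.5) \vee (0.3 \wedge 0.5) = 0.45 \ \ge\ 0.4 \\[6pt]
\text{Case } Pos(X) \le h_1(x):\quad
  & h_1 = (0.6,\ 0.7), \quad h_2 = (0.8,\ 0.9) \\
  & \int_X h_1(x) \circ Pos(\cdot) = (0.7 \wedge 0.5) \vee (0.6 \wedge 0.5) = 0.5 \\
  & \int_X h_2(x) \circ Pos(\cdot) = (0.9 \wedge 0.5) \vee (0.8 \wedge 0.5) = 0.5 = Pos(X)
\end{align*}
```

In the second case both integrals saturate at the measure of the whole set, which is also the upper bound given by Theorem 3.1.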

4. Decision Rule and Training Algorithm 

In the procedure given below, we consider a two class pattern recognition problem, or a two 
alternative decision process. The approach can be extended directly to multiple classes, but 
from the particular structure of the training mechanism, it would be more appropriate to view 
it as a series of two class problems, either as pairwise distinctions, or as each class against all 
of the remaining classes. Since the possibility integral algorithm does not create geometric 
decision boundaries in feature spaces (as does, for example, Bayes decision theory), the second 
approach is reasonable and contains fewer subdecisions which need to be made to extend this 
to multiple classes. 

The actual decision algorithm utilizes the nature of the possibility integral to split the 
input objects (as represented by the evidence function h(x) ) into four groups to reduce the 
computational load. The first two groups deal with the case where the strength of all objective 
evidence for one class outweighs that for the other. In most cases, this corresponds to the fact 
that, in a pattern recognition problem, a majority of the data are easily distinguished (being 
quite typical of their class). Decision rules 1 and 2 below are a consequence of Theorem 3.2 
assuming that the possibility measures for both classes in this case are identical. Of course, 
there are problems where the objective evidence for one class can dominate that for the other 
class, and yet the object belongs to the latter. This could happen if the worths of the sources, i.e., 
the densities, are vastly different between classes. During training, this condition is 
monitored, and if the training data produce such outcomes, the first two rules are abandoned, 
forcing all training samples to be "conflict data". 

The initial definition of "conflict" is an object where the evidence function for one class 
does not dominate that of the other. In this case, we split the training data (and also the 
unknown test objects) into two subgroups based on the class receiving the highest degree of 
support from any source. For each group, two possibility measures are formed which minimize 
the total misclassification of the training data. The purpose of partitioning the data in this 
manner is to reduce the size of the training set since our initial training scheme is 
a complete search through a quantized set of all pairs of density functions. To reduce further 
the amount of computation, we note that the value of a possibility integral cannot be larger 
than the maximum of the function being integrated. This fact allows us to restrict the range of 
density values to be no larger than the maximum evidential support in the training set. 
(Reducing the training sets gives more opportunity to invoke this restriction.) Optimal pairs of 
density functions (in terms of minimal error rate on the training data) are formed and then used 
in the testing cycle. There are four possibility measures generated during training, one from each 
class in each of the two subgroups of conflict data. 
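
The complete search over quantized density pairs for one subgroup of conflict data might look like the following Python sketch. It is an illustration under assumptions, not the report's code: the names `train_density_pair` and `possibility_integral`, the number of quantization levels, the tie-breaking in favor of the first pair found, and the toy samples are all hypothetical. The grid is capped at the maximum evidential support in the training set, as described above.

```python
from itertools import product
from typing import List, Sequence, Tuple

# One training sample: (evidence for class 1, evidence for class 2, true class).
Sample = Tuple[Sequence[float], Sequence[float], int]


def possibility_integral(h: Sequence[float], densities: Sequence[float]) -> float:
    """Finite possibility integral: max over j of min(h(x_j), Pos(A_j))."""
    order = sorted(range(len(h)), key=lambda j: h[j], reverse=True)
    pos_a, best = 0.0, 0.0
    for j in order:
        pos_a = max(pos_a, densities[j])
        best = max(best, min(h[j], pos_a))
    return best


def train_density_pair(samples: List[Sample], n_levels: int = 3):
    """Exhaustive search over quantized density pairs; returns (errors, p1, p2)."""
    n = len(samples[0][0])
    # Densities above the largest evidence value cannot change any integral.
    h_max = max(max(max(h1), max(h2)) for h1, h2, _ in samples)
    grid = [h_max * (k / n_levels) for k in range(n_levels + 1)]
    best_errors, best_p1, best_p2 = len(samples) + 1, None, None
    for p1 in product(grid, repeat=n):
        for p2 in product(grid, repeat=n):
            errors = 0
            for h1, h2, label in samples:
                e1 = possibility_integral(h1, p1)
                e2 = possibility_integral(h2, p2)
                predicted = 1 if e1 > e2 else 2
                errors += predicted != label
            if errors < best_errors:
                best_errors, best_p1, best_p2 = errors, p1, p2
    return best_errors, best_p1, best_p2


# Toy conflict data with two sources (made-up values).
toy = [
    ([0.8, 0.2], [0.3, 0.6], 1),
    ([0.7, 0.3], [0.4, 0.5], 1),
    ([0.6, 0.1], [0.2, 0.9], 2),
    ([0.5, 0.3], [0.4, 0.8], 2),
]
print(train_density_pair(toy))
```

The cost grows as the number of grid levels raised to twice the number of sources, which is why shrinking the training set and the density range matters for this kind of search.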

The decision algorithm is summarized below. 

BEGIN 

FOR each feature data vector DO obtain $h_1(x_j)$ for all $j$ and $h_2(x_j)$ for all $j$; 

(1) IF $h_1(x_j) > h_2(x_j)$ for all $j$, THEN the feature data vector belongs to class 1. 

(2) ELSE IF $h_1(x_j) < h_2(x_j)$ for all $j$, THEN the feature data vector belongs to class 2. 

(3) ELSE 

If $\bigvee_{j=1}^{n} h_1(x_j) > \bigvee_{j=1}^{n} h_2(x_j)$, Then 

$e_1 = \bigvee_{j=1}^{n} [ h_1(x_j) \wedge Pos_{11}(A_j) ]$, $e_2 = \bigvee_{j=1}^{n} [ h_2(x_j) \wedge Pos_{12}(A_j) ]$ 

Else 

$e_1 = \bigvee_{j=1}^{n} [ h_1(x_j) \wedge Pos_{21}(A_j) ]$, $e_2 = \bigvee_{j=1}^{n} [ h_2(x_j) \wedge Pos_{22}(A_j) ]$ 

End If 

If $e_1 > e_2$, Then the feature data vector belongs to class 1, 

Else the feature data vector belongs to class 2. 

End If 
END IF 
END FOR 
END. 
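
The decision rule can be sketched in Python as below. This is a hedged illustration: the function names, the dictionary keys `pos11`, `pos12`, `pos21`, `pos22` used to hold the four trained density vectors, and all numerical values are assumptions made for the example.

```python
from typing import Dict, Sequence


def possibility_integral(h: Sequence[float], densities: Sequence[float]) -> float:
    """Finite possibility integral: max over j of min(h(x_j), Pos(A_j))."""
    order = sorted(range(len(h)), key=lambda j: h[j], reverse=True)
    pos_a, best = 0.0, 0.0
    for j in order:
        pos_a = max(pos_a, densities[j])
        best = max(best, min(h[j], pos_a))
    return best


def classify(h1: Sequence[float], h2: Sequence[float],
             measures: Dict[str, Sequence[float]]) -> int:
    """Two-class decision following rules (1), (2), and (3)."""
    # Rules (1) and (2): the evidence for one class dominates at every source.
    if all(a > b for a, b in zip(h1, h2)):
        return 1
    if all(a < b for a, b in zip(h1, h2)):
        return 2
    # Rule (3): conflict data. Pick the subgroup by the strongest single source.
    if max(h1) > max(h2):
        e1 = possibility_integral(h1, measures["pos11"])
        e2 = possibility_integral(h2, measures["pos12"])
    else:
        e1 = possibility_integral(h1, measures["pos21"])
        e2 = possibility_integral(h2, measures["pos22"])
    return 1 if e1 > e2 else 2


# Hypothetical trained densities for two sources (made-up values).
measures = {
    "pos11": [0.9, 0.0], "pos12": [0.0, 0.9],
    "pos21": [0.9, 0.0], "pos22": [0.0, 0.9],
}
print(classify([0.9, 0.8], [0.1, 0.2], measures))  # rule (1) applies
print(classify([0.8, 0.2], [0.3, 0.6], measures))  # rule (3), first subgroup
```

If training shows that rules (1) and (2) misclassify dominated samples, the two early returns would simply be skipped so that every sample goes through rule (3).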

5. Experimental Results 

Two shuttle-earth-space intensity images were used in the experiment, in which all the 
data from the two images were treated as "conflict data" and hence only the third decision rule 
applies. 

The training image is shown in Fig. 5.1 and the test image is shown in Fig. 5.5. Three 
texture feature images (contrast, difference, and entropy) were derived from each of the training 
and test images, i.e., three feature images for training and three feature images 
for testing (for the definitions of these features, please see the section on membership generation 
techniques in this report). The three feature images used for training the possibility densities 
are shown in Fig. 5.2. The three feature images used in testing are shown in Fig. 5.6. 

The possibility distribution (or membership function) of each class in each feature, which is 
used to generate the evidence function $h(x)$, is determined by applying the possibilistic clustering 
algorithm to the histograms of each class in each feature; this algorithm is described in another 
section of this report. 

While training, the possibility densities were determined with the “perceptron criterion” 
(i.e., minimize the decision error) from the feature images in Fig. 5.2. The segmentation result 
corresponding to the possibility measure(s) for the training image is shown in Fig. 5.3, in 
which the shuttle and its background are clearly segmented, except that the shuttle body seems 
disconnected. To improve the connection of the shuttle body, the possibility densities of the 
shuttle were raised slightly, from which the segmentation result in Fig. 5.4 and the result in 
Fig. 5.7 (for the test case) were obtained. These results can be improved quite easily with a 
shrink-expand operation. 

6. Conclusion 

In this paper, a decision making algorithm based on a variation of the fuzzy integral was 
proposed. The possibility integral has a particularly simple generation capability. The 
algorithm was run on the shuttle-earth-space images, and reasonably good results were obtained. 

Fig 5.1 Intensity training image. 





Fig 5.2 (top left) Intensity training image. 

(top right) Contrast feature image, 
(bottom left) Difference feature image, 
(bottom right) Entropy feature image. 





Fig 5.3 Segmented image 1 using the possibility integral algorithm. 



Fig 5.4 Segmented image 2 using the possibility integral algorithm. 







Fig 5.6 (top left) Intensity testing image. 

(top right) Contrast feature image, 
(bottom left) Difference feature image, 
(bottom right) Entropy feature image. 





Calculation of Membership Functions 


Our work in this area has progressed nicely. We have designed and implemented a 
new algorithm to generate membership values from a set of training data using a multi-layer 
neural network. This is in addition to the progress we made in the transformation of 
"probability density functions" into possibility distributions for use in assigning 
membership values to individual points as reported in the third quarter report. 



